An Ontology-Based HTML to XML Conversion Using Intelligent Agents

Authors

  • Thomas E. Potok
  • Mark T. Elmore
  • Joel W. Reed
  • Nagiza F. Samatova
Abstract

How to organize and classify large amounts of heterogeneous information accessible over the Internet is a major problem faced by industry, government, and military organizations. XML is clearly a potential solution to this problem [1,2]; however, a significant challenge is how to automatically convert information currently expressed in standard HTML format to an XML format. Within the Virtual Information Processing Agent Research (VIPAR) project, we have developed a process that uses Internet ontologies and intelligent software agents to perform automatic HTML to XML conversion for Internet newspapers. The VIPAR software is based on a number of significant research breakthroughs, most notably the ability of intelligent agents to use a flexible RDF ontology to transform HTML documents into XML-tagged documents. The VIPAR system is currently deployed at the US Pacific Command, Camp Smith, HI, traversing up to 17 Internet newspapers daily.

Footnote 1: The Virtual Information Processing Agent Research (VIPAR) project is a software research and development project funded by the Office of Naval Research (ONR) via Interagency Agreement 2302-Q326-A1 with the Department of Energy’s Oak Ridge National Laboratory (ORNL). The submitted manuscript has been authored by a contractor of the U.S. Government under contract No. DE-AC05-96OR22464. Accordingly, the U.S. Government retains a non-exclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.
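As a rough illustration of the ontology-driven conversion the abstract describes, the following Python sketch maps site-specific HTML cues to XML tags. It is not the VIPAR implementation: the `SITE_ONTOLOGY` dictionary is a hypothetical stand-in for a per-newspaper RDF ontology entry, and the class names (`headline`, `byline`, `story`), the XML tag names, and the sample page are all assumptions made for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical stand-in for one newspaper's ontology entry: it records which
# (HTML tag, class attribute) pairs carry which semantic field on that site.
# A real system would read this from an RDF description instead of a dict.
SITE_ONTOLOGY = {
    ("h1", "headline"): "title",
    ("span", "byline"): "author",
    ("div", "story"): "text",
}


class OntologyExtractor(HTMLParser):
    """Collects text that falls inside HTML elements named by the ontology."""

    def __init__(self, ontology):
        super().__init__()
        self.ontology = ontology
        self.stack = []   # (html_tag, mapped_field_or_None) for each open element
        self.fields = {}  # mapped field -> list of text chunks

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        self.stack.append((tag, self.ontology.get((tag, css_class))))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed tags such as <br>.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Assign text to the innermost enclosing element that the ontology maps.
        field = next((f for _, f in reversed(self.stack) if f), None)
        if field and data.strip():
            self.fields.setdefault(field, []).append(data.strip())


def html_to_xml(html, ontology):
    """Return an XML string containing only the fields the ontology identifies."""
    parser = OntologyExtractor(ontology)
    parser.feed(html)
    parser.close()
    article = ET.Element("article")
    for field, chunks in parser.fields.items():
        ET.SubElement(article, field).text = " ".join(chunks)
    return ET.tostring(article, encoding="unicode")


# Invented sample page; the advertisement block is not in the ontology,
# so its text is left out of the XML.
SAMPLE_HTML = """
<html><body>
  <h1 class="headline">Port reopens after storm</h1>
  <span class="byline">A. Reporter</span>
  <div class="story"><p>Ships returned on Monday.</p><p>Repairs continue.</p></div>
  <div class="ad">Buy now!</div>
</body></html>
"""

if __name__ == "__main__":
    print(html_to_xml(SAMPLE_HTML, SITE_ONTOLOGY))
```

Keeping the site knowledge in a data structure rather than in the parser means that supporting another newspaper would, under these assumptions, only require a new ontology entry and no new code.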

Similar Resources

Knowledge Representation on the Internet: Achieving Interoperability in a Dynamic, Distributed Environment

The Internet’s explosive growth is making it harder and harder to harness its potential. There is so much information available that users are frequently overwhelmed by information overload. Due to limitations in modern natural language processing, an important part of search involves keyword-based techniques, which tend to have poor precision and recall. Some systems use the format of a web pa...

OIL: An Ontology Infrastructure for the Semantic Web

have investigated them. More recently, the notion of an ontology is becoming widespread in fields such as intelligent information integration, cooperative information systems, information retrieval, electronic commerce, and knowledge management. Ontologies are becoming popular largely because of what they promise: a shared and common understanding that reaches across people and application syst...

The Semantic Web and Cultural Heritage: Ontologies and Technologies Help in Accessing Museum Information

A virtual museum should support rich semantic associations. In the past much effort has been devoted to meeting scholars' needs by implementing appropriate aids. However, efforts toward a unified schema have all failed. Integration is often attempted at the metadata level, but a more useful effort is to create a “core ontology” which incorporates basic entities and relationships common across ...

From XML to Semantic Web

The present web exists in HTML and XML formats for people to browse. Recently there has been a trend towards the semantic web, where information can be processed and understood by agents. Most present research focuses on the translation from HTML to the semantic web, but seldom on XML. In this paper, we design a method to translate XML to the semantic web. It is known that onto...

An Ontology-Based Content Model for Intelligent Web Content Access Services

Intelligent Web content access is a fundamental Web service, representing the first step toward semantic Web services. A lack of adequate and sufficient interpretation for content in current methods impedes access to content. This study regards Web content as any content described and published in the format of a markup language such as HTML or XML. In this paper, we will present our Content Mo...

Publication date: 2002